{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "6d2b5335",
   "metadata": {},
   "source": [
    "<a href=\"https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/examples/low_level/fusion_retriever.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a0b5c122-d577-4045-b980-cab2eae2aa0c",
   "metadata": {},
   "source": [
    "# Building an Advanced Fusion Retriever from Scratch\n",
    "\n",
    "In this tutorial, we show you how to build an advanced retriever from scratch.\n",
    "\n",
    "Specifically, we show you how to build our `QueryFusionRetriever` from scratch.\n",
    "\n",
    "This is heavily inspired from the RAG-fusion repo here: https://github.com/Raudaschl/rag-fusion."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0d82203e-1aa0-4d85-8a0f-3854dfa81494",
   "metadata": {},
   "source": [
    "## Setup\n",
    "\n",
    "We load documents and build a simple vector index."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9bafe694",
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install llama-index-readers-file\n",
    "%pip install llama-index-llms-openai"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4e54ab99-cf6b-4ca0-9a48-29425fdb18b7",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install rank-bm25 pymupdf"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c79e8b40-c963-46ee-9601-6c31e5901568",
   "metadata": {},
   "outputs": [],
   "source": [
    "import nest_asyncio\n",
    "\n",
    "nest_asyncio.apply()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "41d6148f-3185-4a32-973c-316f23e45804",
   "metadata": {},
   "source": [
    "#### Load Documents"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c054a492-56c9-4dae-bede-06739858ba57",
   "metadata": {},
   "outputs": [],
   "source": [
    "!mkdir data\n",
    "!wget --user-agent \"Mozilla\" \"https://arxiv.org/pdf/2307.09288.pdf\" -O \"data/llama2.pdf\""
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "1126a6d3",
   "metadata": {},
   "source": [
    "If you're opening this Notebook on colab, you will probably need to install LlamaIndex 🦙."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0f03cf99",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install llama-index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b3b7ec9e-30cf-49ba-9b3b-9beb9a2b6758",
   "metadata": {},
   "outputs": [],
   "source": [
    "from pathlib import Path\n",
    "from llama_index.readers.file import PyMuPDFReader\n",
    "\n",
    "loader = PyMuPDFReader()\n",
    "documents = loader.load(file_path=\"./data/llama2.pdf\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f59a6cd1-802a-4e69-afd1-de6faf4b064b",
   "metadata": {},
   "source": [
    "#### Load into Vector Store"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b2423b4b-5c3b-4d36-b338-9bf74f0e6a82",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core import VectorStoreIndex\n",
    "from llama_index.core.node_parser import SentenceSplitter\n",
    "\n",
    "splitter = SentenceSplitter(chunk_size=1024)\n",
    "index = VectorStoreIndex.from_documents(documents, transformations=[splitter])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1145cee6-11b3-4ac1-a1a2-35ff549e7bb2",
   "metadata": {},
   "source": [
    "#### Define LLMs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2c29d22c-ccd0-4db5-a78b-d6d3138b39ef",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.llms.openai import OpenAI"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "483d46c5-fdd1-4c5c-b06f-292442289c40",
   "metadata": {},
   "outputs": [],
   "source": [
    "llm = OpenAI(model=\"gpt-3.5-turbo\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cbc7a9bf-0ef7-45a5-bc76-3c37a8a09c88",
   "metadata": {},
   "source": [
    "## Define Advanced Retriever\n",
    "\n",
    "We define an advanced retriever that performs the following steps:\n",
    "1. Query generation/rewriting: generate multiple queries given the original user query\n",
    "2. Perform retrieval for each query over an ensemble of retrievers.\n",
    "3. Reranking/fusion: fuse results from all queries, and apply a reranking step to \"fuse\" the top relevant results!\n",
    "\n",
    "Then in the next section we'll plug this into our response synthesis module."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3586a793-3c5a-4d7c-b401-cd6fb71f87a1",
   "metadata": {},
   "source": [
    "### Step 1: Query Generation/Rewriting\n",
    "\n",
    "The first step is to generate queries from the original query to better match the query intent, and increase precision/recall of the retrieved results. For instance, we might be able to rewrite the query into smaller queries.\n",
    "\n",
    "We can do this by prompting ChatGPT."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0a5183a0-58ce-4cc5-a74b-8428dfe12bb5",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core import PromptTemplate"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9e745f03-5c06-43b5-ad9d-65c5b8150ba7",
   "metadata": {},
   "outputs": [],
   "source": [
    "query_str = \"How do the models developed in this work compare to open-source chat models based on the benchmarks tested?\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0a17441d-1f14-4de4-a4b3-55177d0a2dee",
   "metadata": {},
   "outputs": [],
   "source": [
    "query_gen_prompt_str = (\n",
    "    \"You are a helpful assistant that generates multiple search queries based on a \"\n",
    "    \"single input query. Generate {num_queries} search queries, one on each line, \"\n",
    "    \"related to the following input query:\\n\"\n",
    "    \"Query: {query}\\n\"\n",
    "    \"Queries:\\n\"\n",
    ")\n",
    "query_gen_prompt = PromptTemplate(query_gen_prompt_str)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5c3a7b04-c4fb-456c-8584-5ea93bdc7bf0",
   "metadata": {},
   "outputs": [],
   "source": [
    "def generate_queries(llm, query_str: str, num_queries: int = 4):\n",
    "    fmt_prompt = query_gen_prompt.format(\n",
    "        num_queries=num_queries - 1, query=query_str\n",
    "    )\n",
    "    response = llm.complete(fmt_prompt)\n",
    "    queries = response.text.split(\"\\n\")\n",
    "    return queries"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2a577a95-2b58-424d-aa7d-bed9aa9ceb98",
   "metadata": {},
   "outputs": [],
   "source": [
    "queries = generate_queries(llm, query_str, num_queries=4)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "73fe53d2-c556-44ed-a255-374ae8eca494",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['1. What are the benchmarks used to evaluate open-source chat models?', '2. Can you provide a comparison between the models developed in this work and existing open-source chat models?', '3. Are there any notable differences in performance between the models developed in this work and open-source chat models based on the benchmarks tested?']\n"
     ]
    }
   ],
   "source": [
    "print(queries)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a9876c65-1a90-4606-9104-5df10009d935",
   "metadata": {},
   "source": [
    "### Step 2: Perform Vector Search for Each Query\n",
    "\n",
    "Now we run retrieval for each query. This means that we fetch the top-k most relevant results from each vector store.\n",
    "\n",
    "**NOTE**: We can also have multiple retrievers. Then the total number of queries we run is N*M, where N is number of retrievers and M is number of generated queries. Hence there will also be N*M retrieved lists.\n",
    "\n",
    "Here we'll use the retriever provided from our vector store. If you want to see how to build this from scratch please see [our tutorial on this](https://docs.llamaindex.ai/en/latest/examples/low_level/retrieval.html#put-this-into-a-retriever)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "53114651-0b57-4cbc-a07f-d906c5820cb7",
   "metadata": {},
   "outputs": [],
   "source": [
    "from tqdm.asyncio import tqdm\n",
    "\n",
    "\n",
    "async def run_queries(queries, retrievers):\n",
    "    \"\"\"Run queries against retrievers.\"\"\"\n",
    "    tasks = []\n",
    "    for query in queries:\n",
    "        for i, retriever in enumerate(retrievers):\n",
    "            tasks.append(retriever.aretrieve(query))\n",
    "\n",
    "    task_results = await tqdm.gather(*tasks)\n",
    "\n",
    "    results_dict = {}\n",
    "    for i, (query, query_result) in enumerate(zip(queries, task_results)):\n",
    "        results_dict[(query, i)] = query_result\n",
    "\n",
    "    return results_dict"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d046d284-ab9e-4242-b91a-8d06c472dfaf",
   "metadata": {},
   "outputs": [],
   "source": [
    "# get retrievers\n",
    "from llama_index.core.retrievers import BM25Retriever\n",
    "\n",
    "\n",
    "## vector retriever\n",
    "vector_retriever = index.as_retriever(similarity_top_k=2)\n",
    "\n",
    "## bm25 retriever\n",
    "bm25_retriever = BM25Retriever.from_defaults(\n",
    "    docstore=index.docstore, similarity_top_k=2\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6a0ddc59-1602-4078-b3e9-dadb852709fc",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00, 22.59it/s]\n"
     ]
    }
   ],
   "source": [
    "results_dict = await run_queries(queries, [vector_retriever, bm25_retriever])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "65f13dff-e5fa-4c60-865c-7217dd4c87e6",
   "metadata": {},
   "source": [
    "### Step 3: Perform Fusion\n",
    "\n",
    "The next step here is to perform fusion: combining the results from several retrievers into one and re-ranking.\n",
    "\n",
    "Note that a given node might be retrieved multiple times from different retrievers, so there needs to be a way to de-dup and rerank the node given the multiple retrievals.\n",
    "\n",
    "We'll show you how to perform \"reciprocal rank fusion\": for each node, add up its reciprocal rank in every list where it's retrieved.\n",
    "\n",
    "Then reorder nodes by highest score to least.\n",
    "\n",
    "Full paper here: https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a4702259-ea82-4e08-b64c-1300a3857c63",
   "metadata": {},
   "outputs": [],
   "source": [
    "def fuse_results(results_dict, similarity_top_k: int = 2):\n",
    "    \"\"\"Fuse results.\"\"\"\n",
    "    k = 60.0  # `k` is a parameter used to control the impact of outlier rankings.\n",
    "    fused_scores = {}\n",
    "    text_to_node = {}\n",
    "\n",
    "    # compute reciprocal rank scores\n",
    "    for nodes_with_scores in results_dict.values():\n",
    "        for rank, node_with_score in enumerate(\n",
    "            sorted(\n",
    "                nodes_with_scores, key=lambda x: x.score or 0.0, reverse=True\n",
    "            )\n",
    "        ):\n",
    "            text = node_with_score.node.get_content()\n",
    "            text_to_node[text] = node_with_score\n",
    "            if text not in fused_scores:\n",
    "                fused_scores[text] = 0.0\n",
    "            fused_scores[text] += 1.0 / (rank + k)\n",
    "\n",
    "    # sort results\n",
    "    reranked_results = dict(\n",
    "        sorted(fused_scores.items(), key=lambda x: x[1], reverse=True)\n",
    "    )\n",
    "\n",
    "    # adjust node scores\n",
    "    reranked_nodes: List[NodeWithScore] = []\n",
    "    for text, score in reranked_results.items():\n",
    "        reranked_nodes.append(text_to_node[text])\n",
    "        reranked_nodes[-1].score = score\n",
    "\n",
    "    return reranked_nodes[:similarity_top_k]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "497cf22e-a5fe-4dcd-9227-13cf63b3622c",
   "metadata": {},
   "outputs": [],
   "source": [
    "final_results = fuse_results(results_dict)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "87061aed-fbfb-4949-9fc6-f687050314d0",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/markdown": [
       "**Node ID:** d92e53b7-1f27-4129-8d5d-dd06638b1f2d<br>**Similarity:** 0.04972677595628415<br>**Text:** Figure 12: Human evaluation results for Llama 2-Chat models compared to open- and closed-source models\n",
       "across ~4,000 helpfulness prompts with three raters per prompt.\n",
       "The largest Llama 2-Chat model is competitive with ChatGPT. Llama 2-Chat 70B model has a win rate of\n",
       "36% and a tie rate of 31.5% relative to ChatGPT. Llama 2-Chat 70B model outperforms PaLM-bison chat\n",
       "model by a large percentage on our prompt set. More results and analysis is available in Section A.3.7.\n",
       "Inter-Rater Reliability (...<br>"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/markdown": [
       "**Node ID:** 20d32df8-e16e-45fb-957a-e08175e188e8<br>**Similarity:** 0.016666666666666666<br>**Text:** Figure 1: Helpfulness human evaluation results for Llama\n",
       "2-Chat compared to other open-source and closed-source\n",
       "models. Human raters compared model generations on ~4k\n",
       "prompts consisting of both single and multi-turn prompts.\n",
       "The 95% confidence intervals for this evaluation are between\n",
       "1% and 2%. More details in Section 3.4.2. While reviewing\n",
       "these results, it is important to note that human evaluations\n",
       "can be noisy due to limitations of the prompt set, subjectivity\n",
       "of the review guidelines, s...<br>"
      ],
      "text/plain": [
       "<IPython.core.display.Markdown object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from llama_index.core.response.notebook_utils import display_source_node\n",
    "\n",
    "for n in final_results:\n",
    "    display_source_node(n, source_length=500)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8aa602b1-4188-48ae-9d6c-127778019261",
   "metadata": {},
   "source": [
    "**Analysis**: The above code has a few straightforward components.\n",
    "1. Go through each node in each retrieved list, and add it's reciprocal rank to the node's ID. The node's ID is the hash of it's text for dedup purposes.\n",
    "2. Sort results by highest-score to lowest.\n",
    "3. Adjust node scores."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5d92ae5d-1220-4988-998e-06a22cc4f7b2",
   "metadata": {},
   "source": [
    "## Plug into RetrieverQueryEngine\n",
    "\n",
    "Now we're ready to define this as a custom retriever, and plug it into our `RetrieverQueryEngine` (which does retrieval and synthesis)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a4ccd8d7-9fc5-463c-ac0c-f9e1d1bae63c",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core import QueryBundle\n",
    "from llama_index.core.retrievers import BaseRetriever\n",
    "from typing import Any, List\n",
    "from llama_index.core.schema import NodeWithScore\n",
    "\n",
    "\n",
    "class FusionRetriever(BaseRetriever):\n",
    "    \"\"\"Ensemble retriever with fusion.\"\"\"\n",
    "\n",
    "    def __init__(\n",
    "        self,\n",
    "        llm,\n",
    "        retrievers: List[BaseRetriever],\n",
    "        similarity_top_k: int = 2,\n",
    "    ) -> None:\n",
    "        \"\"\"Init params.\"\"\"\n",
    "        self._retrievers = retrievers\n",
    "        self._similarity_top_k = similarity_top_k\n",
    "        super().__init__()\n",
    "\n",
    "    def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:\n",
    "        \"\"\"Retrieve.\"\"\"\n",
    "        queries = generate_queries(llm, query_str, num_queries=4)\n",
    "        results = run_queries(queries, [vector_retriever, bm25_retriever])\n",
    "        final_results = fuse_results(\n",
    "            results_dict, similarity_top_k=self._similarity_top_k\n",
    "        )\n",
    "\n",
    "        return final_results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b2b641b1-e64e-4ddf-9cc5-88ff5c57b70e",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core.query_engine import RetrieverQueryEngine\n",
    "\n",
    "fusion_retriever = FusionRetriever(\n",
    "    llm, [vector_retriever, bm25_retriever], similarity_top_k=2\n",
    ")\n",
    "\n",
    "query_engine = RetrieverQueryEngine(fusion_retriever)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c0d81b5b-39f2-42da-92b9-ce6113fa43d9",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/Users/jerryliu/Programming/gpt_index/llama_index/indices/base_retriever.py:22: RuntimeWarning: coroutine 'run_queries' was never awaited\n",
      "  return self._retrieve(str_or_query_bundle)\n",
      "RuntimeWarning: Enable tracemalloc to get the object allocation traceback\n"
     ]
    }
   ],
   "source": [
    "response = query_engine.query(query_str)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "93daeaaa-ce68-465a-b246-287714b4b370",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The models developed in this work, specifically the Llama 2-Chat models, are competitive with open-source chat models based on the benchmarks tested. The largest Llama 2-Chat model has a win rate of 36% and a tie rate of 31.5% relative to ChatGPT, which indicates that it performs well in comparison. Additionally, the Llama 2-Chat 70B model outperforms the PaLM-bison chat model by a large percentage on the prompt set used for evaluation. While it is important to note the limitations of the benchmarks and the subjective nature of human evaluations, the results suggest that the Llama 2-Chat models are on par with or even outperform open-source chat models in certain aspects.\n"
     ]
    }
   ],
   "source": [
    "print(str(response))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "llama_index_v2",
   "language": "python",
   "name": "llama_index_v2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
